Document Classification

Abstract

Keywords can be used as attributes for mining rules or as a basis for measuring the similarity of new (unclassified) documents with existing (classified) ones. The focus is on the problem of extracting keywords from a document collection in order to use them for classification. Document classification is a hot topic in machine learning. Typical approaches extract “features,” generally words, from each document, and use the resulting feature vectors as input to a learning scheme that learns how to classify documents. This “bag of keywords” model neglects the contextual effects of keywords.
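The bag-of-keywords pipeline described in the abstract can be illustrated with a minimal sketch. It is not the chapter's method: the stop-word list, the cosine similarity measure, the nearest-neighbour rule, and the toy labelled documents are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def extract_keywords(text):
    """Return a bag (word -> count) of lowercase tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(new_doc, labelled_docs):
    """Give a new document the label of its most similar classified document."""
    new_vec = extract_keywords(new_doc)
    best_label, best_score = None, -1.0
    for text, label in labelled_docs:
        score = cosine_similarity(new_vec, extract_keywords(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data for illustration only.
LABELLED = [
    ("The team won the football match in extra time", "sports"),
    ("The central bank raised interest rates again", "finance"),
]

if __name__ == "__main__":
    print(classify("Interest rates and bank lending", LABELLED))
```

Because each document is reduced to keyword counts, `extract_keywords("bank interest")` and `extract_keywords("interest bank")` produce the same vector, which is the loss of context the abstract points out.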

Similar resources

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing it, and detecting and correcting image skew. In document segmentation, an algorith...

Classification DOCUMENT CONTROL DATA

Field trials of the LTS-3 system at Keesler Air Force Base have been extended, and excellent results have been obtained with high-aptitude students, who had been excluded from earlier trials. A study of the use of the LTS for task simulation has led to the implementation of a new student response interpretation feature for the system. Design of the microfiche selector/reader breadboard for LTS-...

Tion for Document Classification

The bag-of-words (BOW) model is the common approach for classifying documents, where words are used as features for training a classifier. This generally involves a huge number of features. Some techniques, such as Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA), have been designed to summarize documents in a lower dimension with the least loss of semantic information. Some sema...

Intelligent document classification

In this work we investigate some technical questions related to the application of neural networks in document classification. First, we discuss the effects of different averaging protocols for the χ² statistic used to remove non-informative terms. This is an especially relevant issue for the neural network technique, which requires an aggressive dimensionality reduction to be feasible. Second, ...

Journal

Journal title: Advances in data mining and database management book series

Year: 2021

ISSN: 2327-199X, 2327-1981

DOI: https://doi.org/10.4018/978-1-7998-3772-5.ch007